Authors:
This brings us to the crux of the issue. Even just looking at the current extent of Sage Grouse habitat, it is clear that even in reduced numbers, sagebrush still covers a highly dispersed area in the Intermountain West. Developing effective conservation practices requires monitoring more land than is practical through traditional means. A 2015 study by Doherty et al., using 30$m^{2}$ and 1$km^{2}$ resolution remote imagery, found that the accuracy of these images to model habitat was highly dependent on scale due to localized variation in sage grouse habitats. We will be exploring whether higher spatial resolution remote imagery can accurately model vegetation structure and composition in these areas, and if so, which scales are most informative.
For our project, we selected two NEON sites representative of the sagebrush ecosystem with a range of openly available remote sensing data. We chose the National Ecological Observatory Network's (NEON's) Central Plains Experimental Range (CPER) in Weld County, Colorado, and the Onaqui field site (ONAQ) in the Onaqui Mountains of Nevada. They contain a large amount of sagebrush habitat. See section 2.4 for site details.
This notebook uses a variety of python packages to ensure reproducibility across operating systems, collect data, structure and visualize data, perform calculations, and map sites.
# Import packages
import os
import matplotlib.pyplot as plt
import seaborn as sns
import requests
import urllib
import folium
import numpy as np
import pandas as pd
from pandas.io.json import json_normalize
import geopandas as gpd
from shapely.geometry import shape
from shapely.geometry import Polygon
from shapely.geometry import box
from shapely.geometry import Point
import rasterio as rio
from rasterio.mask import mask
from rasterio.plot import plotting_extent
import rasterstats as rs
import earthpy as et
import earthpy.plot as ep
# Set working directory
os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))
We used functions for all of our preprocessing work. We accessed NEON's insitu and Canopy Height Model (CHM) datasets via API calls, created shapefiles from dataframes, cross-referenced CHM and insitu coordinates, and called only raster CHM tiles we needed for vegetation structure comparison using custom functions. Finally, we compared the accuracy of stem heights between insitu data plots (40$m^{2}$) and Canopy Height Models (CHM, 1$km^{2}$, with 1$m$ resolution). In order to compare the accuracy of insitu collected vegetation structure data with high resolution CHM data, we plotted the maximum and median of both sets of data (although, we also calculated the minimum and mean).
def open_ecosystem_structure(site, date):
'''Uses API call to retrieve NEON ecosystem structure (CHM)
data at a given site and date. Returns list of all rasters
within data product. For more information on NEON ecosystem
structure data and a full list of available dates see
https://data.neonscience.org/data-products/DP3.30015.001
Parameters
----------
site : str
4 Letter site name. See
https://www.neonscience.org/field-sites/field-sites-map/list
for a full list of NEON sites
date : str
Date of data collection in yyyy-mm format
Returns
-------
CHM_raster_tiles : .tif
All raster .tif tiles associated with
the site and date specified
'''
data_product_url = ['https://data.neonscience.org/api/v0/data/DP3.30015.001/'
+ site+'/'+date]
call_response = requests.get(data_product_url[0])
call_response.json()
CHM_raster_tiles = []
for i in call_response.json()['data']['files']:
data_file_url = i['url']
file_format = data_file_url.find('.tif')
if not file_format == -1:
CHM_raster_tiles.append(data_file_url)
return CHM_raster_tiles
def open_woody_veg_structure(site, date):
'''Uses API call to retrieve NEON product data for woody
vegetation structure. Returns pandas of merged apparent
individual, mapping and tagging, and per plot per year
documents, eg one dataframe with locational, species,
and height data. Also returns a pandas dataframe of filtered
plot data to facilitate geospatial merges and calculation of
raster stats. For more information on NEON woody vegetation
structure data products and available dates, see
https://data.neonscience.org/data-products/DP1.10098.001
Parameters
----------
site : str
4 Letter site name. See
https://www.neonscience.org/field-sites/field-sites-map/list
for a full list of NEON sites
date : str
Date of data collection in yyyy-mm format
Returns
-------
all_merged_df : pandas.core.frame.DataFrame
Pandas dataframe of merged measurement, plot, and mapping
tabular files from data product
plot_df : pandas.core.frame.DataFrame
Pandas dataframe of perplotperyear.csv locational data
'''
data_product_url = ['https://data.neonscience.org/api/v0/data/DP1.10098.001/'
+ site+'/'+date]
call_response = requests.get(data_product_url[0])
all_urls = []
for i in call_response.json()['data']['files']:
data_file_url = i['url']
height_find = data_file_url.find('individual')
plot_find = data_file_url.find('perplot')
map_find = data_file_url.find('mapping')
if not height_find == -1:
apparent_df = pd.read_csv(data_file_url)
elif not plot_find == -1:
plot_df = pd.read_csv(data_file_url)
elif not map_find == -1:
map_df = pd.read_csv(data_file_url)
apparent_df = apparent_df[[
'plotID', 'individualID', 'height']]
plot_df = plot_df[['plotID', 'plotType',
'decimalLatitude', 'decimalLongitude',
'easting', 'northing']]
map_df = map_df[['plotID', 'individualID', 'scientificName']]
measurement_map_merge = pd.merge(
apparent_df, map_df, on=['plotID', 'individualID'])
all_merged_df = pd.merge(plot_df, measurement_map_merge, on='plotID')
return all_merged_df, plot_df
def NEON_site_extent(path_to_NEON_boundaries, site):
'''Extracts a NEON site extent from an individual site as
long as the original NEON site extent shape file contains
a column named 'siteID'.
Parameters
----------
path_to_NEON_boundaries : str
The path to a shape file that contains the list
of all NEON site extents, also known as field
sampling boundaries (can be found at NEON and
ESRI sites)
site : str
One siteID contains 4 capital letters,
e.g. CPER, HARV, ONAQ or SJER.
Returns
-------
site_boundary : geopandas.geodataframe.GeoDataFrame
A vector containing a single polygon
per the site specified.
'''
NEON_boundaries = gpd.read_file(path_to_NEON_boundaries)
boundaries_indexed = NEON_boundaries.set_index(['siteID'])
site_boundary = boundaries_indexed.loc[[site]]
site_boundary.reset_index(inplace=True)
return site_boundary
def buffer_point_plots(df, crs, buffer):
'''Creates geodataframe from plot points
within a designated coordinate reference system.
Buffers plot points to a given radius. Compatible
with most NEON tabular plot data files including
northing and easting locational columns. Final product
can be used to visualize plot locations or combined
with other spatial data products.
Parameters
----------
df : pandas.core.frame.DataFrame
df including Northing and Easting plot locations
crs : str or rasterio.crs.CRS
String of desired coordinate reference system
buffer : int
Desired radius for final plot polygons
Returns
-------
buffered_gdf : geopandas.geodataframe.GeoDataFrame
Dataframe with point plots buffered to polgyons
'''
buffered_gdf = gpd.GeoDataFrame(
df, geometry=gpd.points_from_xy(
x=df.easting, y=df.northing), crs=crs)
buffered_gdf['geometry'] = buffered_gdf.geometry.buffer(buffer)
return(buffered_gdf)
def tiles_over_insitu_plots(tiles, plots):
'''Takes a list of raster images and geodataframe
of plot polygons within the same CRS. Cross references
overlap between raster extent polygon and plot point
polygons. Returns list of .tiff file locations that
overlap completely with plot polygons.
----------
tiles : list
List of rasters
plots : geopandas.geodataframe.GeoDataFrame
Geodataframe with polygons of AOI plots
Returns
-------
target_rasters : list
List of strings with to raster locations
'''
target_rasters = []
insitu_plot_size = plots.loc[0, 'geometry'].area
for tile in tiles:
with rio.open(tile) as src:
extent = plotting_extent(src)
raster_polygon = Polygon([
[extent[0], extent[2]],
[extent[0], extent[3]],
[extent[1], extent[3]],
[extent[1], extent[2]]])
raster_polygon_gdf = gpd.GeoDataFrame(crs=src.crs,
geometry=[raster_polygon])
raster_plot_intersection = gpd.overlay(
raster_polygon_gdf, plots, how='intersection')
if raster_plot_intersection['geometry'].empty:
pass
elif int(
raster_plot_intersection.loc[0, 'geometry'].area) == int(
insitu_plot_size):
target_rasters.append(tile)
return target_rasters
def calculate_rasterstats_dataframe(tiles, plot_polygons):
'''Creates a geodataframe object with lidar summary statistics
using rasterstats zonal. Requires a pandas dataframe with plot
polygons to cross reference with lidar calculations. Outputs a
single dataframe with uniquely named summary statistics.
Parameters
----------
df : pandas.core.frame.DataFrame
df including Northing and Easting plot locations
crs : str or rasterio.crs.CRS
String of desired coordinate reference system or
rasterio CRS object
Returns
-------
CHM_stats : list of geopandas.geodataframe.GeoDataFrame
Returns a list of geodataframes with lidar max, lidar mean,
lidar median and lidar min calculated in new columns.
calculations.
'''
CHM_stats = []
for tile in tiles:
with rio.open(tile) as chm_src:
site_chm = chm_src.read(1, masked=True)
site_chm_meta = chm_src.meta
site_tree_heights = rs.zonal_stats(
plot_polygons,
site_chm,
affine=site_chm_meta["transform"],
geojson_out=True,
copy_properties=True,
nodata=0,
stats=["mean", "median", "max", "min"])
site_tree_heights_gdf = gpd.GeoDataFrame.from_features(
site_tree_heights)
rename_dict_lidar = {"mean": "lidar_mean",
"median": "lidar_median",
"max": "lidar_max",
"min": "lidar_min"}
site_tree_heights_gdf.rename(columns=rename_dict_lidar, inplace=True)
CHM_stats.append(site_tree_heights_gdf)
return CHM_stats
All data for this project is openly available through the National Ecological Observatory Network (NEON). The following Data Products are were used:
# Download shapefile of all NEON site boundaries
url = 'https://www.neonscience.org/neon-terrestrial-field-site-boundaries-shapefile'
et.data.get_data(url=url, replace=True)
# Create path to shapefile
terrestrial_sites = os.path.join(
'data', 'earthpy-downloads',
'fieldSamplingBoundaries (1)',
'terrestrialSamplingBoundaries.shp')
# Import insitu plot data for CPER and ONAQ sites
CPER_insitu_df, CPER_plots = open_woody_veg_structure(
site='CPER', date='2017-09')
ONAQ_insitu_df, ONAQ_plots = open_woody_veg_structure(
site='ONAQ', date='2017-09')
# Import CHM data and identify crs
CPER_tif_files = open_ecosystem_structure(
site='CPER', date='2017-05')
with rio.open(CPER_tif_files[0]) as CPER_src:
CPER_crs = CPER_src.crs
ONAQ_tif_files = open_ecosystem_structure(
site='ONAQ', date='2017-06')
with rio.open(ONAQ_tif_files[0]) as ONAQ_src:
ONAQ_crs = ONAQ_src.crs
# Create geodataframes with buffered plot points
CPER_insitu_gdf = buffer_point_plots(
df=CPER_insitu_df, crs=CPER_crs, buffer=40)
ONAQ_insitu_gdf = buffer_point_plots(
df=ONAQ_insitu_df, crs=ONAQ_crs, buffer=40)
CPER is a 6,300 hectare NEON site located inside Pawnee National Grasslands in Weld County, Colorado. It is a mixed sagebrush/shortgrass steppe ecosystem that is fragmented by cattle grazing and oil and gas extraction. See below for an interactive map with the CPER site and plot points outlined.

# Get shapefile of CPER site extent
CPER_site_outline = NEON_site_extent(
path_to_NEON_boundaries=terrestrial_sites,
site='CPER')
# Convert plot locations and site outline to JSONs in correct CRS
CPER_plot_locations_json = CPER_insitu_gdf.to_crs(epsg=4326).to_json()
CPER_site_outline_json = CPER_site_outline.to_crs(epsg=4326).to_json()
# Create scalable map with CPER site location and plot points
CPER_map = folium.Map([40.78, -104.7456],
zoom_start=10,
tiles='Stamen Toner')
CPER_points = folium.features.GeoJson(
CPER_plot_locations_json, tooltip='CPER Site')
CPER_site = folium.features.GeoJson(CPER_site_outline_json)
CPER_map.add_child(CPER_site).add_child(CPER_points)
CPER_map
The Onaqui Mountains, southwest of Salt Lake City, are home to the Onaqui Wild Horses. It is a sagebrush steppe ecosystem that is periodically grazed by livestock and fragmented by cheatgrass. See below for an interactive map with the NEON ONAQ site and plot locations overlayed.

# Import shapefile of NEON site
ONAQ_site_outline = NEON_site_extent(
path_to_NEON_boundaries=terrestrial_sites, site='ONAQ')
# Convert site outline and plot points to JSONs
ONAQ_plot_locations_json = ONAQ_insitu_gdf.to_crs(epsg=4326).to_json()
ONAQ_site_outline_json = ONAQ_site_outline.to_crs(epsg=4326).to_json()
# Create interactive map with ONAQ site outline and plot points
ONAQ_map = folium.Map([40.17759, -112.45244],
zoom_start=10,
tiles='Stamen Toner')
ONAQ_points = folium.features.GeoJson(
ONAQ_plot_locations_json, tooltip='ONAQ Site')
ONAQ_site = folium.features.GeoJson(ONAQ_site_outline_json)
ONAQ_map.add_child(ONAQ_site).add_child(ONAQ_points)
ONAQ_map
# Create aoi list overlapping CHM tiles with insitu plots
# using function
CPER_AOI_tifs = tiles_over_insitu_plots(
tiles=CPER_tif_files, plots=CPER_insitu_gdf)
ONAQ_AOI_tifs = tiles_over_insitu_plots(
tiles=ONAQ_tif_files, plots=ONAQ_insitu_gdf)
# Open only the CHM tiles in aoi
with rio.open(CPER_AOI_tifs[0]) as CPER_src:
CPER_CHM = CPER_src.read(1, masked=True)
CPER_extent = plotting_extent(CPER_src)
with rio.open(ONAQ_AOI_tifs[0]) as ONAQ_src:
ONAQ_CHM = ONAQ_src.read(1, masked=True)
ONAQ_extent = plotting_extent(ONAQ_src)
Turns out that our focal species, Artemsia tridentata, also known as Great Basin Sagebrush or Big Sagebrush, is actually not that big. In addition to not-so-big Big Sagebrush, these ecosystems are often characterized by grasses and other short vegetation. Unfortunately, this vegetation is too short to be preserved in NEON CHMs since NEON discards all measurements less than 2 meters.
# Visualizing height distribution at focal sites
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
fig.suptitle('Histograms of Insitu Height Frequency',
x=0.5, y=1.0, fontsize=14)
CPER_insitu_gdf['height'].plot.hist(
ax=axes[0], color=['palegoldenrod'],
title=('CPER May 2017'))
ONAQ_insitu_gdf['height'].plot.hist(
ax=axes[1], color=['burlywood'],
title='ONAQ June 2017')
for i in axes:
i.set_xlabel('Stem Heights (m)')
i.set_yticks([100, 200, 300, 400])
axes[0].text(0.5, -0.165, 'Data Source: NEON',
ha='center', transform=axes[0].transAxes,
fontsize=10)
axes[1].text(1.71, -0.165, 'Data Source: NEON',
ha='center', transform=axes[0].transAxes,
fontsize=10)
plt.show()
# Visualizing CHM aoi's from the CPER and ONAQ sites
#####ADD MARKDOWN FOR THESE #####
### UNITS ###
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 12))
fig.suptitle(
'Plots of Canopy Height Model Rasters Over Sagebrush Ecosystems',
x=0.5, y=0.72, fontsize=14)
ep.plot_bands(CPER_CHM, cmap='BrBG', ax=ax1,
title='CHM of CPER May 2017',
scale=False)
ax1.text(0.5, -0.075, 'Data Source: NEON',
ha='center', transform=ax1.transAxes, fontsize=10)
ep.plot_bands(ONAQ_CHM, cmap='BrBG', ax=ax2,
title='CHM of ONAQ June 2017',
scale=False)
ax2.text(0.5, -0.075, 'Data Source: NEON',
ha='center', transform=ax2.transAxes, fontsize=10)
plt.show()
As you can see, the maximum height of all stems in either CPER or ONAQ is less than 2m, as measured by hand. This is visualized in the CHM raster plots which show no gradation within the 0-2 meter range. Therefore, all of our lidar data would show zero similarity to the insitu measurement results (all lidar stem heights would be 'None', reflecting all stems <2m being dropped). However, with our reproducible code, we are able to check to see if other sites are able to produce desired results, validating our workflow.
NEON's Great Smokey Mountains site (GRSM) in Tennessee (seen in the Figure with all NEON field sites map) is a deciduous and evergreen forest with a wide variety of stem heights. We used the same preprocessing and statistics approach to compare the insitu plots and CHM rasters. The heterogeneous vegetation will result in CHM data >2m, thereby allowing us to explore the accuracy of the 1m resolution CHM data with insitu measured data more clearly.
# Get raster data and source crs
GRSM_tif_files = open_ecosystem_structure(
site='GRSM', date='2017-10')
with rio.open(GRSM_tif_files[0]) as GRSM_src:
GRSM_crs = GRSM_src.crs
# Get insitu data and create geodataframe for insitu
GRSM_insitu_df, GRSM_plots = open_woody_veg_structure(
site='GRSM', date='2017-10')
# Create plot geodataframe with plot data only for rs.zonal
GRSM_plot_gdf = buffer_point_plots(
df=GRSM_plots, crs=GRSM_crs, buffer=40)
#Create plot geodataframe with height data for histogram
GRSM_insitu_gdf = buffer_point_plots(
df=GRSM_insitu_df, crs=GRSM_crs, buffer=40)
GRSM_AOI_tifs = tiles_over_insitu_plots(
tiles=GRSM_tif_files, plots=GRSM_plot_gdf)
with rio.open(GRSM_AOI_tifs[0]) as GRSM_src:
GRSM_CHM = GRSM_src.read(1, masked=True)
GRSM_extent = plotting_extent(GRSM_src)
# ADD PLOTS
fig, axes = plt.subplots(1, 2, figsize=(12, 7))
fig.suptitle('Exploring GRSM CHM and Stem Heights')
ep.plot_bands(GRSM_CHM, cmap='BrBG', ax=axes[0],
title='CHM of GRSM May, 2017',
scale=False)
GRSM_insitu_gdf['height'].plot.hist(
ax=axes[1], color=['darkseagreen'],
title='Distribution of Stem Hieghts (m)\n'
'at GRSM Oct, 2017')
axes[1].set_xlabel('Stem Heights (m)')
axes[1].set_yticks([25, 50, 75, 100, 125])
axes[0].text(0.155, -0.1, 'Data Source: NEON',
ha='center', transform=axes[0].transAxes,
fontsize=10)
plt.show()
# Use function including raster stats to calculate CHM statistics
GRSM_tree_heights_gdf = calculate_rasterstats_dataframe(
tiles=GRSM_AOI_tifs, plot_polygons=GRSM_plot_gdf)
# For this site, only one lidar tile overlaps
# This extracts the focal tile
GRSM_tree_heights_gdf = GRSM_tree_heights_gdf[0]
# Create summary statistics for insitu measurements
GRSM_insitu_stats = GRSM_insitu_gdf.groupby("plotID").agg([
"mean", "median", "max", "min"])["height"]
# Rename max/mean columns to include 'insitu'
GRSM_insitu_stats.rename(columns={"mean": "insitu_mean",
"median": "insitu_median",
"max": "insitu_max",
"min": "insitu_min"},
inplace=True)
# # Merge insitu mean/max df with the lidar df
GRSM_CHM_insitu = GRSM_tree_heights_gdf.merge(
GRSM_insitu_stats,
left_on="plotID",
right_on="plotID")
# Plot 1 & 2 - GRSM max and mean
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 7))
fig.suptitle('Great Smokey Mountains, TN, October 2017'
'\n Calculated Lidar vs. Insitu Stem Heights',
fontsize=14)
# Add diagonal 1:1 and regression lines
ax1.plot((0, 1), (0, 1), transform=ax1.transAxes, ls='--', c='k')
sns.regplot(GRSM_CHM_insitu.lidar_max,
GRSM_CHM_insitu.insitu_max,
ax=ax1)
ax2.plot((0, 1), (0, 1), transform=ax2.transAxes, ls='--', c='k')
sns.regplot(GRSM_CHM_insitu.lidar_median,
GRSM_CHM_insitu.insitu_median, ax=ax2)
# Set title and labels for axes
ax1.set(xlabel="Max Lidar Stem Height (m)",
ylabel="Max Insitu Stem Height (m)",
xlim=(0, 50), ylim=(0, 50),
title='GRSM Max Height Comparison')
ax2.set(xlabel="Median Lidar Stem Height (m)",
ylabel="Median Insitu Stem Height (m)",
xlim=(0, 50), ylim=(0, 50),
title='GRSM Median Height Comparison')
ax1.text(0.155, -0.15, 'Data Source: NEON',
ha='center', transform=ax1.transAxes, fontsize=10)
plt.show()
The comparisons between GRSM maximum and median stem heights using insitu and lidar measurements are mixed and limited. It appears that there may be a strong relationship between the maximum insitu and lidar height measurements, but a poor association between the medians. The 4 plots that had 100% overlapping lidar tiles, with stems greater than 2m, are not strong enough results to assess the 1m resolution lidar data at this stage. However, further exploration is needed to judge lidar's usefulness in machine learning analysis.
While our present results are currently unable to contribute to our goals of assessing lidar data's accuracy of modeling sagebrush vegetation, we do know that we have a workflow for assessing CHM data in the future and that lidar has potential to contribute to assessing habitats in larger scales. There is a lot of work left to do.
Our pilot project produced code that will work, once we find/create appropriate CHM data. However, our goals are to analyze the accuracy of a range of remote sensing products at different resolutions that can best describe sage grouse habitat at larger scales. Looking forward, minimally we would like to continue to explore CPER and ONAQ sites by:
Ideally, our next steps will also include incorporating resource or habitat modeling for sage grouse and/or using our code to reproduce results at other sites outside of NEON (e.g. Western Wyoming), but this will largely depend on time.
Although our current report was not able to derive the results we needed to assess lidar data's accuracy modeling sagebrush habitat, it does provide us with a workflow that will carry us forward. After we create our own CHM data, we will not only be able to assess lidar's ability to measure stems that are <2m in height, we will also be able to continue our quest in assessing other remote sensing instruments ability to characterize vegetation structure and composition in these environments. The overall project will create a workflow that helps wildlife and land managers prioritize sage grouse lekking sites, therefore many plants and animal habitats, with data that cannot possibly all be measured in person.